1. Introduction


We shall consider the frequently met problem of linear extrapolation of a
stationary random process $x(s)$, $-\infty < s < \infty$, with $Ex(s) = 0$. The problem
consists of finding the linear functional $\hat{x}(t;\tau)$, $\tau > 0$, of the values $x(s)$ for
$s \le t$ (extrapolation from the entire past of the process) or for $t - T \le s \le t$
(extrapolation of a process given on a finite interval) that gives the best
approximation to the random variable $x(t+\tau)$. "Best" is meant here in the
least-squares sense; that is, the functional $\hat{x}(t;\tau)$ is required to make
the mean-square prediction error

(1)  $\sigma^2(\tau) = E|x(t+\tau) - \hat{x}(t;\tau)|^2$

take on its minimum value.

A. N. Kolmogorov [1], [2] initiated the theory of linear least-squares
extrapolation of stationary processes. This theory was developed further by
M. G. Krein [3], N. Wiener [4], K. Karhunen [5], and others. At present, it has
achieved a significant degree of completion (see, for example, Doob [6],
chapter XII, or Rozanov [7]). We may formulate the general solution of this problem
in the following way.

Let us start from the spectral representation of the stationary stochastic
process in the form

(2)  $x(s) = \int_{-\infty}^{\infty} e^{is\lambda}\,dZ(\lambda)$,

where $Z(\lambda)$ is a stochastic measure on the axis $-\infty < \lambda < \infty$. This measure
is connected to the spectral function $F(\lambda)$ of the process $x(s)$ by the relationship

(3)  $E\left\{\int_{S} dZ(\lambda)\,\overline{\int_{S_1} dZ(\lambda)}\right\} = \int_{S \cap S_1} dF(\lambda)$,

where the bar above a symbol signifies the complex conjugate. If $F'(\lambda)$ is
zero on a set of nonzero Lebesgue measure, or if $F'(\lambda)$ is nonzero almost
everywhere but

(4)  $\int_{-\infty}^{\infty} \frac{|\log F'(\lambda)|}{1+\lambda^2}\,d\lambda = \infty$,

then the best linear extrapolator $\hat{x}(t;\tau)$ agrees almost surely with $x(t+\tau)$; in
other words, in this case $\sigma(\tau) = 0$ for all $\tau$. If the integral on the left-hand side
of (4) converges, then

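(5)  $\hat{x}(t;\tau) = \int_{-\infty}^{\infty} e^{it\lambda}\,\Phi_\tau(\lambda)\,dZ(\lambda)$,

where $\Phi_\tau(\lambda) = 0$ for $\lambda \in S$, and

(6)  $\Phi_\tau(\lambda) = \frac{1}{2\pi\varphi(\lambda)} \int_0^{\infty} e^{-ip\lambda}\,dp \int_{-\infty}^{\infty} e^{i(p+\tau)u}\,\varphi(u)\,du$  for $\lambda \notin S$.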


Here $S$ is a set of zero Lebesgue measure consisting of the discontinuities of
$F(\lambda)$ and of the growth points of the singular component of $F(\lambda)$, and $\varphi(\lambda)$ is
defined by the condition $\varphi(\lambda) = \lim_{\mu \downarrow 0} \varphi(\lambda - i\mu)$ for almost all $\lambda$, where

(7)  $\varphi(w) = \exp\left\{ -\frac{1}{2\pi i} \int_{-\infty}^{\infty} \frac{1+\lambda w}{\lambda - w}\,\frac{\log F'(\lambda)}{1+\lambda^2}\,d\lambda \right\}$.

The function $\varphi(w)$ is analytic and has no zeros in the lower half-plane of the
complex variable $w$, and its boundary value on the real axis $\varphi(\lambda)$ satisfies the
condition $|\varphi(\lambda)|^2 = F'(\lambda)$ almost everywhere.

The function $\Phi_\tau(\lambda)$ is called the spectral characteristic for linear extrapolation.
When the analytic expression for this function is known, it is usually also
possible to give an explicit expression for the best extrapolator $\hat{x}(t;\tau)$. In fact, from
(5) and (2) we have

(8)  $\hat{x}(t;\tau) = \int_0^{\infty} x(t-p)\,w(p)\,dp$,

where $w(p)$ is the generalized function (a Schwartz distribution) which is the
Fourier transform of the function $\Phi_\tau(\lambda)$:

(9)  $w(p) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ip\lambda}\,\Phi_\tau(\lambda)\,d\lambda$,  $\Phi_\tau(\lambda) = \int_0^{\infty} e^{-ip\lambda}\,w(p)\,dp$.

The mean-square extrapolation error is expressed in terms of the spectral
characteristic of the extrapolation by the formula

(10)  $\sigma^2(\tau) = \int_{-\infty}^{\infty} |e^{i\tau\lambda} - \Phi_\tau(\lambda)|^2\,dF(\lambda) = E|x(t)|^2 - \int_{-\infty}^{\infty} |\Phi_\tau(\lambda)|^2\,dF(\lambda)$.


In a number of cases the function $\Phi_\tau(\lambda)$ may also be found without using the
complicated formulas (6) and (7). Thus, for example, in the case of an absolutely
continuous spectral function $F(\lambda)$, it is easy to show that if there exists a
function $\psi$ of the real variable $\lambda$ such that $\psi$ (a) belongs to the space $L^2(dF)$ (has an
integrable square modulus with respect to the measure $F'(\lambda)\,d\lambda$), (b) may be continued
analytically into the lower half-plane so that its order of growth there does not exceed
a power of $|\lambda|$, and (c) satisfies the condition that $[e^{i\tau\lambda} - \psi(\lambda)]F'(\lambda)$ may be
continued analytically into the upper half-plane so that it falls off at infinity no more
slowly than a power of $|\lambda|$, then $\psi$ will indeed be the spectral characteristic
$\Phi_\tau(\lambda)$ (see [8]).
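
As an illustration of how conditions (a), (b), and (c) can be checked, consider the simplest rational case. This is only a sketch: the density $F'(\lambda) = B/(\lambda^2 + a^2)$ with $B > 0$, $a > 0$ and the constant candidate $\psi(\lambda) = e^{-a\tau}$ are chosen here for illustration and are not taken from the text above.

```latex
\begin{align*}
&\text{(a)}\quad \int_{-\infty}^{\infty} |\psi(\lambda)|^2 F'(\lambda)\,d\lambda
   = e^{-2a\tau}\,\frac{\pi B}{a} < \infty, \\
&\text{(b)}\quad \psi(\lambda) = e^{-a\tau}
   \ \text{is constant, hence analytic and bounded in the lower half-plane}, \\
&\text{(c)}\quad \bigl[e^{i\tau\lambda} - e^{-a\tau}\bigr] F'(\lambda)
   = \frac{B\,\bigl[e^{i\tau\lambda} - e^{-a\tau}\bigr]}{(\lambda - ia)(\lambda + ia)} .
\end{align*}
```

In (c) the numerator vanishes at $\lambda = ia$, so the only singularity in the upper half-plane is removable, and since $|e^{i\tau\lambda}| \le 1$ there for $\tau > 0$, the expression falls off like $|\lambda|^{-2}$. Under these assumptions $\Phi_\tau(\lambda) = e^{-a\tau}$, the extrapolator is $e^{-a\tau}x(t)$, and formula (10) gives $\sigma^2(\tau) = (\pi B/a)(1 - e^{-2a\tau})$.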

A general solution of the problem of the best least-squares linear extrapolation
of a stationary process $x(s)$ (such as the one given by formulas (5)-(7)) cannot be
obtained when only the values on the finite interval $t - T \le s \le t$ are used. However,
some sufficient conditions similar to the conditions (a), (b), and (c) presented
above, which permit the direct selection of the spectral extrapolation
characteristic $\Phi_\tau(\lambda)$ in several special cases, may be formulated for this case too.
For example, let us suppose that the nondecreasing bounded function $F(\lambda)$ is
absolutely continuous and that there exists a function $\psi$ such that $\psi$ ($a_T$) belongs to $L^2(dF)$,
($b_T$) is an entire function of the complex variable $\lambda$ of the form $\psi(\lambda) = \sum_{k=1}^{r} e^{-is_k\lambda}\psi_k(\lambda)$,
where $r$ is an integer, $0 \le s_k \le T$ for all $k$, and all $\psi_k(\lambda)$ are rational functions,
and ($c_T$) satisfies the condition that $[e^{i\tau\lambda} - \psi(\lambda)]F'(\lambda)$ may be represented in the
form $\varphi_1(\lambda) + e^{-iT\lambda}\varphi_2(\lambda)$, where $\varphi_1(\lambda)$ may be continued analytically into the upper
half-plane and $\varphi_2(\lambda)$ may be continued analytically into the lower half-plane so that
both functions fall off in the corresponding half-planes no more slowly than a power
of $|\lambda|$. Then it is possible to show that $\psi(\lambda)$ will indeed be the spectral characteristic
for the linear extrapolation of the stationary process $x(s)$ with the spectrum $F(\lambda)$
in terms of the values $x(s)$ on the interval $t - T \le s \le t$ (see [9]).

2.  Explicit  expressions  for  the  best  extrapolator 

The general case of an arbitrary stationary process was considered in the
works of Kolmogorov [1], [2] and Krein [3] on the theory of extrapolation.
However, since it is impossible to give any uniquely defined "most natural"
representation for the functional $\hat{x}(t;\tau)$ in the general case, the problem of
finding this extrapolator was not even posed in those works, and
all attention was turned to finding an expression for the mean-square
extrapolation error $\sigma^2(\tau)$ and, especially, to clarifying the conditions under which
$\sigma^2(\tau) = 0$ or, conversely, $\sigma^2(\tau) \ne 0$. Wiener's great contribution was that he was
the first to direct attention to the possibility of obtaining very simple and
convenient explicit expressions for the best extrapolator $\hat{x}(t;\tau)$ in some particular
cases. Namely, in his book [4] Wiener examined the case of stationary processes
with an absolutely continuous spectral function $F(\lambda)$ and an everywhere positive
rational spectral density $F'(\lambda)$. The general form of such a spectral density is
given by

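(11)  $F'(\lambda) = B\,\frac{\left|\prod_{k=1}^{M}(\lambda - \beta_k)\right|^2}{\left|\prod_{j=1}^{N}(\lambda - \alpha_j)\right|^2}$,  $-\infty < \lambda < \infty$,

where $B > 0$, $N > M$, and the imaginary parts of all roots $\alpha_j$ and $\beta_k$ are positive.
It is easy to verify that in the case of (11) the function $\varphi(w)$ in formula (7)
acquires the extremely simple form

(12)  $\varphi(w) = \sqrt{B}\,\frac{\prod_{k=1}^{M}(w - \beta_k)}{\prod_{j=1}^{N}(w - \alpha_j)}$.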
Substituting this expression into (6) (for the case $S = \emptyset$), Wiener obtained an
explicit expression for $\Phi_\tau(\lambda)$ in the form

(13)  $\Phi_\tau(\lambda) = \frac{\gamma_\tau(\lambda)}{\prod_{k=1}^{M}(\lambda - \beta_k)}$,  $\gamma_\tau(\lambda) = \sum_{j=0}^{N-1} C_j\lambda^{N-j-1}$,

where the coefficients $C_j = C_j(\tau)$ (dependent on $\tau$) are determined from a simple
algebraic system of $N$ linear equations. The same result is obtained even more
simply by starting from the sufficient conditions (a), (b), and (c) defining $\Phi_\tau(\lambda)$
mentioned at the end of section 1. The extrapolator $\hat{x}(t;\tau)$ of the form

(14)  $\hat{x}(t;\tau) = \sum_{j=0}^{N-M-1} B_j x^{(j)}(t) + \sum_{k=1}^{M} B_{N-M-1+k}\int_0^{\infty} e^{i\beta_k p}x(t-p)\,dp$,

where the coefficients $B_0, \ldots, B_{N-1}$ are linear combinations of $C_0, \ldots, C_{N-1}$,
corresponds to the spectral characteristic (13) under the condition that the $\beta_k$
are all different. When multiple roots $\beta_k = \beta_{k+1} = \cdots = \beta_{k+l-1}$ exist among the roots
$\beta_1, \ldots, \beta_M$, the weight functions $e^{i\beta_k p}, \ldots, e^{i\beta_{k+l-1}p}$ in (14) must be replaced by
$e^{i\beta_k p}, pe^{i\beta_k p}, \ldots, p^{l-1}e^{i\beta_k p}$. All these results are widely known at present and may
be found in several advanced mathematical and engineering texts.
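
A small numerical check may make the structure of (14) more concrete. The sketch below assumes the covariance $B(s) = e^{-a|s|}(1 + a|s|)$, which corresponds to a density proportional to $(\lambda^2 + a^2)^{-2}$, that is, to (11) with $N = 2$, $M = 0$; the candidate extrapolator $B_0x(t) + B_1x'(t)$ with $B_0 = e^{-a\tau}(1 + a\tau)$, $B_1 = \tau e^{-a\tau}$ and all function names are assumptions of this illustration, not formulas quoted from the text. The code verifies that the extrapolation error is uncorrelated with every past value $x(t-s)$, $s \ge 0$.

```python
import math

def cov(s, a=1.0):
    """Covariance B(s) = exp(-a|s|) (1 + a|s|), assumed for this sketch."""
    s = abs(s)
    return math.exp(-a * s) * (1.0 + a * s)

def dcov(s, a=1.0):
    """Derivative B'(s) for s >= 0; equals Cov(x'(t), x(t - s))."""
    return -a * a * s * math.exp(-a * s)

def coefficients(tau, a=1.0):
    """Assumed coefficients B_0, B_1 of x(t) and x'(t) in the extrapolator."""
    decay = math.exp(-a * tau)
    return decay * (1.0 + a * tau), decay * tau

def residual_correlation(tau, s, a=1.0):
    """E[(x(t+tau) - B_0 x(t) - B_1 x'(t)) x(t-s)] for s >= 0."""
    b0, b1 = coefficients(tau, a)
    return cov(tau + s, a) - b0 * cov(s, a) - b1 * dcov(s, a)

def mean_square_error(tau, a=1.0):
    """Mean-square error, using the orthogonality of the error to the past.
    Cov(x(t+tau), x'(t)) = -B'(tau) for tau > 0."""
    b0, b1 = coefficients(tau, a)
    return cov(0.0, a) - b0 * cov(tau, a) - b1 * (-dcov(tau, a))

for s in (0.0, 0.3, 1.0, 4.0):
    print(s, residual_correlation(0.7, s))
```

The printed correlations vanish up to rounding, which is the orthogonality condition characterizing the best linear extrapolator in this assumed case.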

It is far less well known, however, that there are many examples of processes with
irrational spectral density $F'(\lambda)$ for which the explicit formula for the best
linear extrapolator is no more complicated than in the case of a rational spectral
density. Apparently, the author gave one of the first such examples about
ten years ago in [10]. The question of the least-squares extrapolation of stochastic
processes with spectral density of the form $F'(\lambda) = A|\lambda|^{-\alpha}$, $-\infty < \lambda < \infty$, was
considered there. Clearly, such a function $F'(\lambda)$ cannot be the spectral density of a
stationary stochastic process $x(s)$, since it is nonintegrable. Nevertheless, the
function $F'(\lambda) = A|\lambda|^{-\alpha}$ for $A > 0$ and $\alpha > 1$ is the spectral density of a
stochastic process $x(s)$ with stationary increments of some order, and the whole
theory of the linear extrapolation of stationary stochastic processes is extended
without difficulty to such processes. In particular, formula (6) is only slightly
changed when applied to processes with stationary increments. It is furthermore
easy to show that for $F'(\lambda) = A|\lambda|^{-\alpha}$ the function $\varphi(w)$ in (7) will be given
by $\varphi(w) = \sqrt{A}\,w^{-\alpha/2}$, where $w = |w|e^{i\theta}$, $0 > \theta > -\pi$.

Substituting this value of $\varphi(w)$ into the appropriately modified formula (6), we
find an analytic expression first for the spectral extrapolation characteristic
$\Phi_\tau(\lambda)$, and then for the best extrapolator $\hat{x}(t;\tau)$. For example, if $1 < \alpha < 2$, it
can thereby be shown that the best extrapolator here has the form

(15)  $\hat{x}(t;\tau) = \frac{\sin(\pi\alpha/2)}{\pi}\,\tau^{\alpha/2}\int_0^{\infty}\frac{x(t-p)}{p^{\alpha/2}(p+\tau)}\,dp$,

and if $2 < \alpha < 3$, then

(15')  $\hat{x}(t;\tau) = x(t) + \frac{\sin(\pi\alpha/2)}{\pi}\,\tau^{\alpha/2}\int_0^{\infty}\frac{x(t-p) - x(t)}{p^{\alpha/2}(p+\tau)}\,dp$.
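
One property of the weight function in (15) that is easy to check numerically is that it integrates to one for $1 < \alpha < 2$, so that a constant level of the process is reproduced. The sketch below is an illustration only: the function name, the substitution $p = \tau e^{v}$, and the truncation limits are choices made here, and the accuracy degrades as $\alpha$ approaches 1 or 2, where the integrand decays slowly.

```python
import math

def weight_integral(alpha, tau, v_max=200.0, step=0.01):
    """Integral over p in (0, inf) of the kernel
    sin(pi*alpha/2)/pi * tau**(alpha/2) / (p**(alpha/2) * (p + tau)),
    computed with the substitution p = tau*exp(v) and the trapezoidal rule."""
    if not 1.0 < alpha < 2.0:
        raise ValueError("this sketch assumes 1 < alpha < 2")
    half = alpha / 2.0
    n = int(round(2.0 * v_max / step))
    total = 0.0
    for i in range(n + 1):
        v = -v_max + i * step
        p = tau * math.exp(v)
        # kernel(p) * dp/dv, where dp/dv = p
        g = tau ** half * p ** (1.0 - half) / (p + tau)
        total += g if 0 < i < n else 0.5 * g
    return math.sin(math.pi * half) / math.pi * total * step

print(weight_integral(1.5, 1.0))
```

The value agrees with the classical integral $\int_0^\infty s^{-\alpha/2}(1+s)^{-1}\,ds = \pi/\sin(\pi\alpha/2)$, which cancels the prefactor in (15).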


In both these cases, the process $x(s)$ is a process with stationary first increments
which has the structure function $E|x(\tau+s) - x(s)|^2$ of the form

(16)  $D(\tau) = E|x(\tau+s) - x(s)|^2 = C|\tau|^{\alpha-1}$,  $C = -4A\,\Gamma(1-\alpha)\sin\frac{\pi\alpha}{2}$.


Kolmogorov  [11]  first  considered  such  stochastic  processes;  it  later  turned  out 
that  they  play  an  essential  part  in  the  statistical  theory  of  turbulent  flows 
(see,  for  example,  [12]). 

The method applied in [10] may also be used to solve problems on the
extrapolation of some stationary stochastic processes with irrational spectral
density. For example, let the spectral density of a stationary random process
$x(s)$ be

(17)  $F'(\lambda) = A(\lambda^2 + a^2)^{-\alpha/2}$,

where $\alpha > 1$, $A > 0$, $a > 0$. In this case, the covariance function $B(\tau)$ of the
process $x(s)$ is

(18)  $B(\tau) = Ex(\tau+s)x(s) = D|\tau|^{(\alpha-1)/2}K_{(\alpha-1)/2}(a|\tau|)$,  $D = A\sqrt{\pi}\,2^{-(\alpha-3)/2}a^{-(\alpha-1)/2}[\Gamma(\alpha/2)]^{-1}$,

where $K_\nu$ is the so-called Basset function (the modified Bessel function of the
second kind). The function $\varphi(w)$ in (7) has the form $\varphi(w) = \sqrt{A}(w^2 + a^2)^{-\alpha/4}$
in this case, where the argument $\theta$ of the complex number $w^2 + a^2 =
(\lambda - i\mu)^2 + a^2$ is assumed to satisfy the inequality $0 > \theta > -2\pi$. Moreover,
repeating the reasoning in [10] which results in formulas (15) and (15'), we
find that


(19)  $\hat{x}(t;\tau) = \frac{\sin(\pi\alpha/2)}{\pi}\,e^{-a\tau}\tau^{\alpha/2}\int_0^{\infty}\frac{e^{-ap}}{p^{\alpha/2}(p+\tau)}\,x(t-p)\,dp$  for $1 < \alpha < 2$,

      $\hat{x}(t;\tau) = \frac{\sin(\pi\alpha/2)}{\pi}\,e^{-a\tau}\tau^{\alpha/2}\int_0^{\infty}\left[\frac{e^{-ap}}{p+\tau}\,x(t-p)\ \cdots\right]\cdots$  for $2 < \alpha < 4$

(analogous formulas for $\hat{x}(t;\tau)$ may also be obtained for $\alpha > 4$). More complex
results of the same kind, referring to the problem of the extrapolation of
homogeneous and isotropic stochastic fields $x(t_1, t_2, \ldots, t_n)$ with a spectral density of
the form (17), in terms of their values in the half-space $t_n < 0$, may be found
in the work of Fortus [13].

Still another class of stationary stochastic processes with irrational spectral
density for which an explicit formula may be written for the best extrapolator
is the class of processes $x(s)$ with spectral density expressed in terms of
polynomials in $\lambda$ and trigonometric functions of $\lambda$ as follows:

(20)  $F'(\lambda) = B\,\frac{\left|\prod_{k=1}^{M}(\lambda-\beta_k)\prod_{m=1}^{K}(1+b_me^{-i\delta_m\lambda})\right|^2}{\left|\prod_{j=1}^{N}(\lambda-\alpha_j)\prod_{n=1}^{L}(1+a_ne^{-i\gamma_n\lambda})\right|^2}$,

where $B > 0$, $N > M$, the imaginary parts of all $\alpha_j$ and $\beta_k$ are positive, and
$a_n$, $b_m$, $\gamma_n$, and $\delta_m$ are real numbers such that $\gamma_n > 0$, $\delta_m > 0$, $|a_n| < 1$, $|b_m| < 1$
for all $n$ and $m$. It is easy to verify that in this case

(21)  $\varphi(w) = \sqrt{B}\,\frac{\prod_{k=1}^{M}(w-\beta_k)\prod_{m=1}^{K}(1+b_me^{-i\delta_mw})}{\prod_{j=1}^{N}(w-\alpha_j)\prod_{n=1}^{L}(1+a_ne^{-i\gamma_nw})}$.

Substituting this expression for $\varphi(w)$ into (6) (with $S = \emptyset$), we may obtain
after some analytical manipulations an explicit formula for the spectral
extrapolation characteristic $\Phi_\tau(\lambda)$ and then for the extrapolator $\hat{x}(t;\tau)$ also. The same
result may be obtained more simply by direct selection of the function $\Phi_\tau(\lambda)$
satisfying the conditions (a), (b), and (c) mentioned at the end of section 1;
that is, by using the method developed in my book [8] to solve problems on
linear extrapolation for the case of a rational spectral density $F'(\lambda)$. Finally,
it is also possible to use here the fact that under condition (20), the difference
equation

(22)  $\prod_{n=1}^{L}[x(s) + a_nx(s-\gamma_n)] = \prod_{m=1}^{K}[y(s) + b_my(s-\delta_m)]$

will have the solution $y(s)$, which is a stationary stochastic process with rational
spectral density of the form (11), such that $H_x(t) = H_y(t)$ (here $H_x(t)$ denotes
the linear span of the set of random variables $x(s)$, $s \le t$, closed
relative to mean-square convergence) and

(22')  $\prod_{n=1}^{L}[\hat{x}(t;\tau) + a_n\hat{x}(t;\tau-\gamma_n)] = \prod_{m=1}^{K}[\hat{y}(t;\tau) + b_m\hat{y}(t;\tau-\delta_m)]$

(the last approach has recently been developed by S. Grigoryev in his candidate
dissertation at Kazan University for the cases $K = 1$, $L = 0$ and $K = 0$,
$L = 1$). In the particular case where


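(23)  $F'(\lambda) = B\,\frac{|1+be^{-i\delta\lambda}|^2}{|\lambda-i\alpha|^2} = B\,\frac{(1+b^2)+2b\cos\delta\lambda}{\lambda^2+\alpha^2}$,

where $B$, $b$, $\alpha$, and $\delta$ are real parameters, $B > 0$, $\alpha > 0$, $\delta > 0$, $|b| < 1$, each
of the three methods we have described leads to the formula

(24)  $\Phi_\tau(\lambda) = \begin{cases} \dfrac{e^{-\alpha\tau} + be^{-i(\delta-\tau)\lambda}}{1+be^{-i\delta\lambda}} & \text{for } \tau \le \delta, \\[2ex] \dfrac{e^{-\alpha\tau}(1+be^{\alpha\delta})}{1+be^{-i\delta\lambda}} & \text{for } \tau \ge \delta. \end{cases}$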
It follows that

(25)  $\hat{x}(t;\tau) = \begin{cases} e^{-\alpha\tau}\sum_{k=0}^{\infty}(-1)^kb^kx(t-k\delta) - \sum_{k=1}^{\infty}(-1)^kb^kx(t+\tau-k\delta) & \text{for } \tau \le \delta, \\[1ex] e^{-\alpha\tau}(1+be^{\alpha\delta})\sum_{k=0}^{\infty}(-1)^kb^kx(t-k\delta) & \text{for } \tau \ge \delta \end{cases}$

(the result (25) for the case $b = -e^{-\alpha\delta}$ has been published by Grigoryev [14],
who used a more artificial method in his paper). If, for example,


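(26)  $F'(\lambda) = \frac{B}{|\lambda-i\alpha|^2\,|1+ae^{-i\gamma\lambda}|^2} = \frac{B}{(\lambda^2+\alpha^2)(1+a^2+2a\cos\gamma\lambda)}$,

where $B > 0$, $\alpha > 0$, $\gamma > 0$, $a$ is real and $|a| < 1$, then

(27)  $\Phi_\tau(\lambda) = \frac{e^{-\alpha\tau}[1-(-1)^ra^re^{r\alpha\gamma}]}{1+ae^{\alpha\gamma}}\,(1+ae^{-i\gamma\lambda}) + (-1)^ra^re^{-i(r\gamma-\tau)\lambda}$

for $(r-1)\gamma < \tau \le r\gamma$, which means that for such $\tau$

(28)  $\hat{x}(t;\tau) = \frac{e^{-\alpha\tau}[1-(-1)^ra^re^{r\alpha\gamma}]}{1+ae^{\alpha\gamma}}\,[x(t)+ax(t-\gamma)] + (-1)^ra^rx(t+\tau-r\gamma)$.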
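
One way to see where (28) comes from is the difference-equation argument of (22) and (22') with $K = 0$, $L = 1$. The following is only a sketch: $y(s)$ denotes the auxiliary process with spectral density $B/(\lambda^2 + \alpha^2)$, and it is assumed here that its best extrapolator is $\hat{y}(t;\tau) = e^{-\alpha\tau}y(t)$ for $\tau > 0$, while $\hat{y}(t;\tau) = y(t+\tau)$ for $\tau \le 0$ because such values belong to the known past.

```latex
\begin{align*}
x(s) + a\,x(s-\gamma) = y(s)
  \quad&\Longrightarrow\quad
  x(s) = \sum_{j=0}^{\infty} (-a)^j\, y(s - j\gamma), \\
\hat{x}(t;\tau)
  &= \sum_{j=0}^{r-1} (-a)^j e^{-\alpha(\tau - j\gamma)}\, y(t)
   + \sum_{j=r}^{\infty} (-a)^j\, y(t+\tau-j\gamma) \\
  &= e^{-\alpha\tau}\,\frac{1-(-1)^r a^r e^{r\alpha\gamma}}{1+a e^{\alpha\gamma}}\,
     \bigl[x(t) + a\,x(t-\gamma)\bigr]
   + (-1)^r a^r\, x(t+\tau-r\gamma).
\end{align*}
```

Here $(r-1)\gamma < \tau \le r\gamma$, the first sum is a finite geometric series with ratio $-ae^{\alpha\gamma}$, and the second sum equals $(-a)^r x(t+\tau-r\gamma)$ by the series representation of $x$ in the first line.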


As is seen from these examples for specific spectral densities of the form (20),
explicit formulas for the best extrapolator turn out to be no more complex than
for rational spectral densities of the form (11) that contain the same number
of factors in the numerator and denominator. However, the form of the
extrapolators in these cases differs considerably from the forms of the extrapolators
for rational spectral densities.
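
The correspondence between the spectral characteristic (24) and the extrapolator (25) can be checked numerically: a value $x(t-s)$ in the time domain contributes the factor $e^{-is\lambda}$ in the spectral domain, so the truncated series built from (25) should reproduce the closed form (24). The sketch below assumes the reading of (24) and (25) with pole parameter $\alpha$ and coefficient $b$; the function names and the truncation length are illustrative choices.

```python
import cmath
import math

def phi_closed(lam, tau, alpha, b, delta):
    """Closed-form spectral characteristic, as in (24)."""
    denom = 1.0 + b * cmath.exp(-1j * delta * lam)
    if tau <= delta:
        num = math.exp(-alpha * tau) + b * cmath.exp(-1j * (delta - tau) * lam)
    else:
        num = math.exp(-alpha * tau) * (1.0 + b * math.exp(alpha * delta))
    return num / denom

def phi_series(lam, tau, alpha, b, delta, terms=200):
    """Spectral characteristic of the series (25), truncated after `terms` terms.
    A value x(t - s) contributes the factor exp(-i*s*lam)."""
    decay = math.exp(-alpha * tau)
    total = 0j
    if tau <= delta:
        for k in range(terms):
            total += decay * (-b) ** k * cmath.exp(-1j * k * delta * lam)
        for k in range(1, terms):
            total -= (-b) ** k * cmath.exp(-1j * (k * delta - tau) * lam)
    else:
        scale = decay * (1.0 + b * math.exp(alpha * delta))
        for k in range(terms):
            total += scale * (-b) ** k * cmath.exp(-1j * k * delta * lam)
    return total

for tau in (0.3, 1.7):
    diff = abs(phi_closed(2.0, tau, 0.8, 0.5, 1.0) - phi_series(2.0, tau, 0.8, 0.5, 1.0))
    print(tau, diff)
```

With $|b| < 1$ the neglected tail of the series is of order $|b|^{\text{terms}}$, so the two expressions agree to rounding error in both regimes $\tau \le \delta$ and $\tau \ge \delta$.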

Generally, it is considerably more difficult to find an explicit expression for
the best extrapolator $\hat{x}(t;\tau)$ for least-squares linear extrapolation in
terms of the values $x(s)$ on the finite interval $t - T \le s \le t$ than for
extrapolation in terms of the values on the half-axis $s \le t$. However, in the particular
case of a rational spectral density of the form (11) (where the numerator may
even vanish; that is, the imaginary parts of some $\beta$'s may be equal to zero),
this expression may also be effectively determined (for example, by using direct
selection of the characteristic $\Phi_\tau(\lambda)$ satisfying the conditions ($a_T$), ($b_T$), and ($c_T$)
of section 1 or by some other similar method; see, for example, [15], [7], [9]).
It turns out that in this case the extrapolator $\hat{x}(t;\tau)$ has the form

(29)  $\hat{x}(t;\tau) = \sum_{k=0}^{N-M-1}B_kx^{(k)}(t) + \sum_{k=0}^{N-M-1}B_{N-M+k}x^{(k)}(t-T) + \sum_{k=1}^{M}B_{2N-2M-1+k}\int_0^Te^{i\beta_kp}x(t-p)\,dp + \sum_{k=1}^{M}B_{2N-M-1+k}\int_0^Te^{i\bar\beta_kp}x(t-p)\,dp$,
where $B_0, B_1, \ldots, B_{2N-1}$ are $\tau$-dependent coefficients determined from a
system of $2N$ linear equations (for simplicity, we consider all the roots $\beta_1, \ldots, \beta_M$
to be different). It is, however, essential that $N$ of these linear equations be
homogeneous equations not containing the parameter $\tau$; hence, only $N$ of the
$2N$ coefficients $B_0, \ldots, B_{2N-1}$ are independent. Therefore, for any $\tau$ the best
extrapolator $\hat{x}(t;\tau)$ may be represented as the sum of $N$ definite linear
combinations of values and derivatives of the process $x(s)$ at the points $s = t$ and
$s = t - T$ and of integrals of $x(t-p)$, $0 \le p \le T$, with the weight functions $e^{i\beta_kp}$
and $e^{i\bar\beta_kp}$, where every combination is multiplied by some $\tau$-dependent coefficient.

The conditions ($a_T$), ($b_T$), and ($c_T$) may also be applied to finding the explicit
expression for the best extrapolator $\hat{x}(t;\tau)$ in terms of the values $x(s)$ for
$t - T \le s \le t$ in the case of more general spectral densities of the form (20) (where
the imaginary parts of some $\beta$'s may even be zero and some $b$'s may be equal
to $+1$ or $-1$). For the special cases where either $K = 1$, $L = 0$, or $K = 0$, $L = 1$,
the expression for $\hat{x}(t;\tau)$ has recently been obtained by this method by Grigoryev
in his dissertation (the results for the spectral density (23) where $b = -e^{-\alpha\delta}$
were published in [14]). In the general case $K = 1$, $L = 0$ the best extrapolator
for $\tau \ge \delta_1$ consists of an integral term and a linear combination of the values
and derivatives of the process $x(s)$ at the points of the form $t - j\delta_1$ and
$t - T + j\delta_1$, $j = 0, 1, \ldots$, belonging to the interval $[t - T, t]$; the extrapolator
for $\tau < \delta_1$ contains in addition the values and the derivatives of the process
at the points of the form $t + \tau - j\delta_1$, $j = 1, 2, \ldots$. In the case where $K = 0$,
$L = 1$, the best extrapolator $\hat{x}(t;\tau)$ contains an integral term, the values and
the derivatives of the process at the points $t$, $t - T$, $t - \gamma_1$, $t - T + \gamma_1$
belonging to the interval $[t - T, t]$, and the value of the process at the point $t + \tau - r\gamma_1$,
where $(r-1)\gamma_1 < \tau \le r\gamma_1$.

For the process with stationary increments having the structure function (16)
and the spectral density $F'(\lambda) = A|\lambda|^{-\alpha}$, $1 < \alpha < 2$, it is also possible to obtain
the explicit expression for the best extrapolator in terms of the values of the
process on the interval $t - T \le s \le t$ (Krein [16], [17], Grigoryev). According
to Grigoryev, the extrapolator $\hat{x}(t;\tau)$ for $1 < \alpha < 2$ has the form

(30)  $\hat{x}(t;\tau) = \frac{\sin(\pi\alpha/2)}{\pi}\,[\tau(T+\tau)]^{\alpha/2}\int_0^T\frac{x(t-p)}{[p(T-p)]^{\alpha/2}}\left[\frac{1}{p+\tau} - c\right]dp$,

where $c = \frac{2(\alpha-1)}{\alpha T}F(1, \alpha; 1+\alpha/2; -\tau/T)$ and $F(a, b; c; z)$ is the usual
symbol for the hypergeometric series.


3.  Simplified  linear  extrapolators.  The  use  of  the  decomposition  of  the 
random  process  into  the  principal  components 

The problem of finding explicit expressions for the best linear extrapolator
is an interesting, purely analytical problem. However, the solutions of this
problem are rarely used in practice, as they are usually not simple enough.
Besides, even when the explicit expressions for the best extrapolator are used,
they are often not the best in reality. The derivation of the expression for
$\hat{x}(t;\tau)$ requires knowledge of the precise form of the spectral (or covariance)
function of the process, and the spectrum or covariance actually used is in many
cases only an approximation to the true function $F(\lambda)$ or $B(\tau)$, which is either
not known precisely or too complicated.

From the theoretical point of view, the use of the best extrapolators
corresponding to an approximate expression for the covariance function or the
spectrum does not seem justifiable. There are some special examples where
the best extrapolation becomes meaningless (for example, because it contains
nonexistent derivatives) or very far from optimal after very small changes
of the functions $B(\tau)$ and $F(\lambda)$.

(In the special case where it is known that the approximation $F_1(\lambda)$ to the
true spectrum $F(\lambda)$ has the property that the difference $F_1(\lambda) - F(\lambda)$
is itself a spectral function, the situation is simpler. In this case the best
linear extrapolator $\hat{x}_1(t;\tau)$ which corresponds to the spectrum $F_1(\lambda)$ can obviously
be applied to the process $x(s)$ with the spectrum $F(\lambda)$. It is also easy to show
that if $F_1(\lambda) - F(\lambda)$ is a spectral function and $\max_\lambda[F_1(\lambda) - F(\lambda)]$ is small
enough, the error of the extrapolator $\hat{x}_1(t;\tau)$ will be quite close to the error of
the best linear extrapolator $\hat{x}(t;\tau)$ for all $\tau$ (see Rozanov [18]).)

However, in almost all practical applications the use of the best extrapolators
corresponding to rather rough approximations to the covariance or spectral
function leads, as a rule, only to a very small excess of the root-mean-square error
of extrapolation over the root-mean-square error of the true best linear
extrapolator. But the excess of the root-mean-square error over its minimum value $\sigma(\tau)$
will usually also be very small for many linear extrapolators of different forms.
Therefore, in many cases it is possible to fix beforehand a form of the extrapolator
containing a few undetermined parameters and to select only the values of the
parameters from the condition of minimization of the mean-square error. From this
point of view the most interesting result of the theory of linear extrapolation is
the evaluation of the minimum value of the mean-square error. Knowledge of
this irremovable mean-square error of extrapolation permits us to make sure
that the selected simplified extrapolator cannot be significantly improved.

One of the simplest possible extrapolators is evidently the following:

(31)  $\hat{x}(t;\tau) = a(\tau)x(t)$.

The root-mean-square error of the extrapolator (31) has the minimum value
$\sigma_1(\tau) = \{B(0)[1 - B^2(\tau)/B^2(0)]\}^{1/2}$ when $a(\tau) = B(\tau)/B(0)$. In the case of a
convex covariance function $B(\tau)$, the error $\sigma_1(\tau)$ can be compared with the
root-mean-square error $\sigma(\tau)$ of the best linear extrapolator with the help of Hájek's
result [19]. According to this result, if $B(\tau)$ is a convex function, then $\sigma(\tau) \ge
\{B(0)[1 - B(\tau)/B(0)]\}^{1/2}$. It follows that for a convex function $B(\tau)$ the
error $\sigma_1(\tau)$ exceeds $\sigma(\tau)$ by no more than the factor $[1 + B(\tau)/B(0)]^{1/2}$ (that
is, by no more than 50%). Hájek's estimate for $\sigma(\tau)$ is sharp (it is
attained exactly when $B(\tau) = \max\{1 - |\tau|, 0\}$); however, for many individual
covariances it appears to be rather rough. As to nonconvex covariances $B(\tau)$,
there is no general estimate of the ratio $\sigma_1(\tau)/\sigma(\tau)$ (since it is possible that
$\sigma(\tau) = 0$ and $\sigma_1(\tau) > 0$). Nevertheless, even for the nonconvex covariances met
in applications, the value $\sigma_1(\tau)$ is often surprisingly near $\sigma(\tau)$. For example, if
$B(\tau) = \cos a\tau$, then $\max_\tau \sigma_1(\tau)/\sigma(\tau) \approx 1.01$ (that is, $\sigma_1(\tau)$ exceeds $\sigma(\tau)$ by
no more than 1% for all $\tau$). The ratio $\sigma_1(\tau)/\sigma(\tau)$ takes somewhat larger values
in cases where the function $B(\tau)$ is twice differentiable and the best extrapolator
$\hat{x}(t;\tau)$ contains values of the derivatives of $x(s)$ at the point $s = t$. However,
even in these cases the replacement of the best linear extrapolator by the best
extrapolator of the form (31) gives sufficient accuracy in many practical cases.
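
The comparison between $\sigma_1(\tau)$ and Hájek's lower bound is easy to reproduce numerically. The sketch below evaluates both expressions for the triangular covariance named in the text and, as a second illustration chosen here, for the convex covariance $e^{-a|\tau|}$; all function names are hypothetical.

```python
import math

def one_term_error(cov, tau):
    """Root-mean-square error sigma_1(tau) of the best extrapolator a(tau) x(t)."""
    b0, bt = cov(0.0), cov(tau)
    return math.sqrt(b0 * (1.0 - (bt / b0) ** 2))

def hajek_lower_bound(cov, tau):
    """Hajek's lower bound for sigma(tau); valid only for convex covariances."""
    b0, bt = cov(0.0), cov(tau)
    return math.sqrt(b0 * (1.0 - bt / b0))

def triangular(tau):
    """Covariance max(1 - |tau|, 0), for which the text says the bound is attained."""
    return max(1.0 - abs(tau), 0.0)

def exponential(tau, a=1.0):
    """Convex covariance exp(-a|tau|), used here only as a second illustration."""
    return math.exp(-a * abs(tau))

for tau in (0.1, 0.5, 0.9):
    ratio = one_term_error(triangular, tau) / hajek_lower_bound(triangular, tau)
    print(tau, ratio, math.sqrt(1.0 + triangular(tau)))
```

For every $\tau$ with $B(\tau) < B(0)$ the ratio of the two quantities equals $[1 + B(\tau)/B(0)]^{1/2}$, which never exceeds $\sqrt{2}$.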

A still better approximation to the minimum value of the root-mean-square
error of extrapolation can be attained using two-term extrapolators of the form

(32)  $\hat{x}(t;\tau) = a(\tau)x(t) + a_1(\tau)x(t - t_1)$.

When $t_1$ is fixed, the optimal values of the coefficients $a(\tau)$ and $a_1(\tau)$ are
determined from a simple system of two linear regression equations. Determination
of the optimal value of $t_1$ in equation (32) is a complicated mathematical problem
having, in some cases, no solution. (For example, if $B(\tau) = Ce^{-a|\tau|}(1 + a|\tau|)$,
then the root-mean-square error of the extrapolator (32) decreases as $t_1$
decreases, tending to the root-mean-square error of the best linear
extrapolator as $t_1 \to 0$.) However, by means of two or three trials, in almost all cases
it is easy to select a value $t_1$ such that the root-mean-square error of the extrapolator
(32) will exceed the root-mean-square error of the best linear extrapolator by no more
than a few percent. If still greater accuracy is required, it is possible to use
an extrapolator $\hat{x}(t;\tau)$ having the form of a linear combination of three values
$x(s)$ at points $s \le t$.
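
The two regression equations for (32) and the behavior described in the parenthetical example can be sketched as follows. The covariance is the one from the text; the closed form used in `best_mse` for the error of the best linear extrapolator is an assumption of this sketch (it is not quoted from the text), and the function names are hypothetical.

```python
import math

def cov(s, a=1.0, c=1.0):
    """Covariance B(s) = C exp(-a|s|) (1 + a|s|) from the example in the text."""
    s = abs(s)
    return c * math.exp(-a * s) * (1.0 + a * s)

def two_term(tau, t1, a=1.0, c=1.0):
    """Optimal coefficients and mean-square error of a x(t) + a1 x(t - t1).
    Solves [[B(0), B(t1)], [B(t1), B(0)]] [a, a1] = [B(tau), B(tau + t1)]."""
    b0, b1 = cov(0.0, a, c), cov(t1, a, c)
    r0, r1 = cov(tau, a, c), cov(tau + t1, a, c)
    det = b0 * b0 - b1 * b1
    coef0 = (b0 * r0 - b1 * r1) / det
    coef1 = (b0 * r1 - b1 * r0) / det
    mse = b0 - coef0 * r0 - coef1 * r1
    return coef0, coef1, mse

def best_mse(tau, a=1.0, c=1.0):
    """Mean-square error of the best linear extrapolator for this covariance
    (closed form assumed for this sketch)."""
    x = a * tau
    return c * (1.0 - math.exp(-2.0 * x) * (1.0 + 2.0 * x + 2.0 * x * x))

for t1 in (1.0, 0.1, 0.01):
    print(t1, two_term(1.0, t1)[2], best_mse(1.0))
```

The printed errors decrease as $t_1$ decreases and approach the assumed minimum value, in line with the statement that no optimal $t_1 > 0$ exists for this covariance.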

Note that in the case of extrapolation of a multidimensional stationary random
process (that is, of a homogeneous random field) the number of terms on the
right-hand side of equation (32) necessary to attain an accuracy of extrapolation
close to that of the best linear extrapolator appears to be markedly greater
than in the one-dimensional case. For some special cases of extrapolation of a
two-dimensional process $x(t_1, t_2)$ in terms of its values in the half-plane $t_2 < 0$,
it was shown by Fortus [20] that a good approximation to the root-mean-square
error of the best linear extrapolator can be attained only by means of a linear
combination of known values of the process containing no fewer than
ten terms.

One can also find in the scientific literature a great number of functionals
different from linear combinations of some values $x(s)$ at $s \le t$ used as
extrapolators $\hat{x}(t;\tau)$. For example, Yudin suggested in [21] extrapolating the process
$x(s)$ with stationary increments and with structure function (16) by the
arithmetic-mean moving average of the form

(33)  $\hat{x}_a(t;\tau) = \frac{1}{a}\int_0^a x(t-p)\,dp$,


where the best value $a = a_{\mathrm{opt}}$ is determined by the condition of minimization of the
mean-square error $\sigma^2(\tau; a) = E|x(t+\tau) - \hat{x}_a(t;\tau)|^2$. He found that $a_{\mathrm{opt}} = 0$ if
$\alpha > 2$, so that the best extrapolator of the form (33) is the "inertial extrapolator"
$\hat{x}(t;\tau) = x(t)$ if $\alpha > 2$. However, if $1 < \alpha < 2$, then the ratio $a_{\mathrm{opt}}/\tau$ takes a
finite value different from zero, and in this case $\sigma(\tau; a_{\mathrm{opt}})$ exceeds the
root-mean-square error of the best linear extrapolator (15) (found after the
publication of Yudin's paper) by no more than 10%. The fact that extrapolator (33)
is of no use when $\alpha > 2$ is a consequence of the negativeness of the correlation
coefficient between $x(t+\tau) - x(t)$, $\tau > 0$, and $x(t-p) - x(t)$, $p > 0$, in these
cases. It is clear from equation (15') that when $2 < \alpha < 3$, it is much more
reasonable to select an approximate extrapolator of the form

(34)  $\hat{x}_a(t;\tau) = x(t) + \frac{1}{a}\int_0^a[x(t) - x(t-p)]\,dp = 2x(t) - \frac{1}{a}\int_0^a x(t-p)\,dp$.

If the mean-square error of the best extrapolator of the form (34) is again denoted by
$\sigma^2(\tau; a_{\mathrm{opt}})$, then $\sigma(\tau; a_{\mathrm{opt}})$ will also be very close to the root-mean-square error
of the best linear extrapolator for $2 < \alpha < 3$.
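
The dependence of the moving-average extrapolator (33) on the window length $a$ can be explored with a short computation. The sketch below assumes the structure function $D(s) = |s|^{\alpha-1}$ (the constant in (16) set to one) and uses the standard identity $E|\sum_i c_ix(t_i)|^2 = -\tfrac12\sum_{i,j}c_ic_jD(t_i - t_j)$ for weights summing to zero, which gives the closed form coded in `moving_average_mse`; that closed form and the function names are derived or chosen here rather than quoted from the text.

```python
def moving_average_mse(tau, a, alpha):
    """Mean-square error of the moving average over a window of length a,
    for the structure function D(s) = |s|**(alpha - 1) (constant set to 1)."""
    g = alpha - 1.0
    if a == 0.0:
        return tau ** g          # inertial extrapolator x(t)
    first = ((tau + a) ** (g + 1.0) - tau ** (g + 1.0)) / (a * (g + 1.0))
    second = a ** g / ((g + 1.0) * (g + 2.0))
    return first - second

def best_window(tau, alpha, grid_step=0.01, grid_max=5.0):
    """Grid search for the ratio a/tau that minimizes the mean-square error."""
    n = int(round(grid_max / grid_step))
    ratios = [i * grid_step for i in range(n + 1)]
    return min(ratios, key=lambda u: moving_average_mse(tau, u * tau, alpha))

print(best_window(1.0, 1.5), best_window(1.0, 2.5))
```

Under these assumptions the search returns a nonzero ratio $a/\tau$ for $\alpha = 1.5$ and the value zero for $\alpha = 2.5$, matching the qualitative behavior described for $1 < \alpha < 2$ and $\alpha > 2$.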

Sometimes  the  exponentially  weighted  moving  averages  of  the  form 

(35)  x «(0  t)  —  a  fj  e~ai>x{t  -  p)  dp 

are also used for extrapolation (see, for instance, Cox [22], where time
series with discrete time are studied). The extrapolator (35) is closely related
to (33); in the case where the value $a = a_{\mathrm{opt}}$ is determined from the
root-mean-square criterion, its root-mean-square error will in many cases be
only slightly in excess of the minimum value of such an error. For the cases
when the extrapolator (35) is not good enough, Cox [22] suggested the use of an
extrapolator of the form

(36)  $\hat{x}_{a,b}(t; \tau) = b\,x(t) + a(1 - b) \int_0^\infty e^{-a\rho} x(t - \rho)\, d\rho.$

The  last  extrapolator  contains  two  parameters,  a  and  b,  the  values  of  which 
can  be  determined  by  minimization  of  the  mean  square  error. 
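
As a rough illustration of how the parameters of (35) and (36) might be chosen, the sketch below assumes an Ornstein-Uhlenbeck covariance $B(\tau) = C e^{-\alpha|\tau|}$ (an assumption made only for this example), for which the mean square error of (36) has a simple closed form. The function name `mse_cox` and the grid search are hypothetical conveniences, not part of Cox's method.

```python
import math

def mse_cox(a, b, tau, alpha=1.0, C=1.0):
    """Mean square error of extrapolator (36), assuming B(u) = C*exp(-alpha*|u|).

    With Y = a * integral_0^inf exp(-a*rho) x(t - rho) d(rho), as in (35):
      E[x(t) Y] = E[Y^2] = C*q,  E[x(t+tau) Y] = C*q*r,
    where q = a/(a + alpha) and r = exp(-alpha*tau).
    Setting b = 0 gives the error of extrapolator (35).
    """
    q = a / (a + alpha)
    r = math.exp(-alpha * tau)
    cross = b * r + (1.0 - b) * q * r        # E[x(t+tau) * xhat] / C
    power = b * b + (1.0 - b * b) * q        # E[xhat^2] / C
    return C * (1.0 - 2.0 * cross + power)

tau = 0.5
a_grid = [0.05 * k for k in range(1, 401)]   # crude grid, illustration only
b_grid = [0.01 * k for k in range(0, 101)]

# One-parameter family (35): minimize over a with b = 0.
a35 = min(a_grid, key=lambda a: mse_cox(a, 0.0, tau))
err35 = mse_cox(a35, 0.0, tau)

# Two-parameter family (36): minimize over (a, b).
a36, b36 = min(((a, b) for a in a_grid for b in b_grid),
               key=lambda ab: mse_cox(ab[0], ab[1], tau))
err36 = mse_cox(a36, b36, tau)

# Error of the best linear extrapolator exp(-alpha*tau) * x(t) for this process.
best_possible = 1.0 - math.exp(-2.0 * tau)
print(a35, err35, a36, b36, err36, best_possible)
```

Because the family (36) contains (35) as the case $b = 0$, its minimized error can never be larger; for this particular covariance neither family can do better than the single term $e^{-\alpha\tau} x(t)$.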

All extrapolators (31)-(36) are linear combinations with variable coefficients
of some fixed simple linear functionals of the past of the process. It is also
possible to use a linear combination of functionals selected not for their
simplicity but for particular theoretical reasons. For example, it seems
reasonable to select the functionals involved in the extrapolator by a method
based on the general analysis into principal components. This analysis was
introduced by Hotelling [23] at the beginning of the 1930's for finite families
of random variables and is at present a widely used method of multivariate
statistical analysis (see, for example, Anderson [24], chapter 11). Its
generalization to the case of a continuous family of random variables (that is,
to a part of a continuous random process) was later obtained independently by
several scientists (see [25] to [29]). The analysis begins by extracting the
first principal component: the normalized linear combination (or normalized
linear functional) of the given random variables having maximum variance. The
word "normalized" means that the sum of squares of the coefficients, or the
integral of the square of the weight function, is one. Then the second
normalized linear combination or normalized linear functional is sought; it is
uncorrelated with the first one and has maximum variance among all those which
are uncorrelated with the first principal component, and so on. In dealing with
a statistical problem concerning the given family of random variables, it is
natural to look for an approximate solution which depends only on the first few
principal components (supposing that the remaining components, with small
variance, cannot change the solution significantly). In recent years this
approach has often been suggested for practical statistical extrapolation (see,
for instance, Pugachev [30] and Lorenz [31]).

The principal components of the part of the stationary random process $x(s)$,
$t - T < s < t$, with covariance function $B(\tau)$, are the Fourier coefficients of
the process corresponding to the orthogonal set of eigenfunctions of the
integral equation

(37)  $\Lambda \int_{t-T}^{t} B(s - s_1)\,\varphi(s_1)\, ds_1 = \varphi(s), \quad t - T < s < t.$

The variance of the component

(38)  $W_k = \int_0^T x(t - \rho)\,\varphi_k(t - \rho)\, d\rho, \quad \int_{t-T}^{t} |\varphi_k(s)|^2\, ds = 1,$

is equal to $\Lambda_k^{-1}$, where $\Lambda_k$ is the corresponding eigenvalue of the equation (37).
The contribution of the principal component $W_k$ to the best linear extrapolator
$\hat{x}(t; \tau)$ is equal to

(39)  $\hat{x}_k(t; \tau) = \Lambda_k E[x(t + \tau) W_k] \cdot W_k = \Lambda_k \int_{t-T}^{t} B(t + \tau - s_1)\,\varphi_k(s_1)\, ds_1 \cdot W_k.$

The sum of all contributions $\hat{x}_k(t; \tau)$, $k = 1, 2, \ldots$, is evidently equal to the
best linear extrapolator in terms of the values $x(s)$ for $t - T < s < t$ (cf.
Grenander [32], p. 269). So it is natural to expect that the sum of the first
few terms $\hat{x}_k(t; \tau)$, with the smallest indices $k$ corresponding to the smallest
eigenvalues $\Lambda_k$, will form a good approximation to the best extrapolator $\hat{x}(t; \tau)$.
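
The construction in (37)-(39) can be tried numerically. The following is only a sketch: it replaces the integral equation by a midpoint (Nystrom) discretization, sets $t = 0$, and uses an assumed covariance $B(\tau) = e^{-|\tau|}$; the function names are hypothetical.

```python
import numpy as np

def principal_components(cov, T, n=400):
    """Midpoint discretization of (37) on the interval (-T, 0), i.e. with t = 0."""
    h = T / n
    s = -T + h * (np.arange(n) + 0.5)              # midpoints of the grid cells
    K = cov(s[:, None] - s[None, :]) * h           # kernel B(s - s1) times quadrature weight
    var, vec = np.linalg.eigh(K)                   # eigenvalues in ascending order
    order = np.argsort(var)[::-1]                  # largest variance first
    var, vec = var[order], vec[:, order]
    phi = vec / np.sqrt(h)                         # sum(phi_k**2) * h = 1, cf. (38)
    return s, h, var, phi                          # var[k] approximates 1 / Lambda_k

def truncated_error(cov, tau, T, n_terms, n=400):
    """Mean square error when only the first n_terms contributions (39) are kept."""
    s, h, var, phi = principal_components(cov, T, n)
    # c[k] approximates E[x(t + tau) W_k] = integral of B(t + tau - s1) phi_k(s1) ds1
    c = (cov(tau - s)[:, None] * phi[:, :n_terms]).sum(axis=0) * h
    return float(cov(0.0) - np.sum(c ** 2 / var[:n_terms]))

ou = lambda u: np.exp(-np.abs(u))                  # assumed covariance, alpha = C = 1

_, _, variances, _ = principal_components(ou, T=1.0)
e1 = truncated_error(ou, tau=0.1, T=1.0, n_terms=1)
e50 = truncated_error(ou, tau=0.1, T=1.0, n_terms=50)
print(variances[:3], e1, e50, 1.0 - np.exp(-0.2))
```

Comparing `e1` and `e50` with the error $1 - e^{-2\alpha\tau}$ of the best linear extrapolator for this covariance gives a feel for how many components are needed.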

However, the true situation does not coincide with the expected one. Let us
consider the typical case of rational spectral density (11). It is possible to
show that in this case the integral equation (37) is equivalent to the
eigenvalue problem for the differential equation

(40)  $\prod_{k=1}^{N} \left(-\frac{d^2}{ds^2} + \alpha_k^2\right) \varphi(s) = 2\pi\Lambda B \prod_{j=1}^{M} \left(-\frac{d^2}{ds^2} + \beta_j^2\right) \varphi(s)$

with the special boundary conditions at the points $s = t$ and $s = t - T$ (see [33]).
This statement leads to the conclusion that the eigenvalues $\Lambda_k$ in the rational
spectral density case are the roots of some transcendental equation accessible
to numerical analysis. The corresponding eigenfunctions $\varphi_k(s)$ have simple
analytical expressions which involve the parameter $\Lambda_k$. In the simplest case of
the Ornstein-Uhlenbeck process with covariance function $B(\tau) = C e^{-\alpha|\tau|}$, the
transcendental equation for $\Lambda_k$ has the form

(41)  $\frac{C\Lambda - \alpha}{\sqrt{2C\alpha\Lambda - \alpha^2}} \tan\left(\sqrt{2C\alpha\Lambda - \alpha^2}\; T\right) = 1,$

and the functions $\varphi_k(s)$ are proportional either to $\cos \sqrt{2C\alpha\Lambda_k - \alpha^2}\,(s - t + T/2)$
or to $\sin \sqrt{2C\alpha\Lambda_k - \alpha^2}\,(s - t + T/2)$. These results allow one to compute easily
the root-mean-square error $\sigma_1(\tau)$ of the extrapolator $\hat{x}_1(t; \tau)$ for the
Ornstein-Uhlenbeck process $x(s)$, $t - T < s < t$. If, for example, $\alpha\tau = 0.1$, it
appears that $\sigma_1(\tau) \approx 1.3\sigma(\tau)$ for $\alpha T = 1/4$, $\sigma_1(\tau) \approx 1.75\sigma(\tau)$ for $\alpha T = 1$, and
$\sigma_1(\tau) \approx 2.2\sigma(\tau)$ for $\alpha T = 3$, where $\sigma(\tau)$ is the root-mean-square error of the
best linear extrapolator $\hat{x}(t; \tau)$. Similarly, if $\alpha\tau = 0.2$, then $\sigma_1(\tau) \approx 1.15\sigma(\tau)$
for $\alpha T = 1/4$, $\sigma_1(\tau) \approx 1.4\sigma(\tau)$ for $\alpha T = 1$, and $\sigma_1(\tau) \approx 1.6\sigma(\tau)$ for $\alpha T = 3$.
Therefore, the extrapolator $\hat{x}_1(t; \tau)$ involving only the first principal
component is, in this case, satisfactory for a short interval $T$ (and a not too
small $\tau$) but very inaccurate for a long interval $T$. The next approximations
$\hat{x}_n(t; \tau)$, $n = 2, 3, \ldots$, behave the same way, and consequently, in order to
obtain a good approximation to $\hat{x}(t; \tau)$ for a long enough $T$, it is necessary to
use a large number of principal components $W_k$.
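
The ratios $\sigma_1(\tau)/\sigma(\tau)$ discussed above can be checked with a few lines of code. The sketch below works in units $\alpha = C = 1$ and assumes that the first principal component (largest variance) corresponds to the lowest even eigenfunction $\cos \omega(s - t + T/2)$, for which the eigenvalue condition reduces to $\omega \tan(\omega T/2) = \alpha$ with $\omega^2 = 2C\alpha\Lambda - \alpha^2$; the function name is hypothetical.

```python
import math

def first_pc_ratio(alpha_tau, alpha_T):
    """sigma_1(tau) / sigma(tau) for B(u) = exp(-|u|), i.e. alpha = C = 1."""
    tau, T = alpha_tau, alpha_T
    # Smallest root of w * tan(w*T/2) = 1 on (0, pi/T), found by bisection;
    # the left-hand side increases monotonically from 0 to +infinity there.
    lo, hi = 1e-12, math.pi / T - 1e-12
    for _ in range(200):
        w = 0.5 * (lo + hi)
        if w * math.tan(0.5 * w * T) < 1.0:
            lo = w
        else:
            hi = w
    w = 0.5 * (lo + hi)
    inv_lambda = 2.0 / (w * w + 1.0)                  # variance of W_1, from w^2 = 2*Lambda - 1
    norm = 0.5 * T + math.sin(w * T) / (2.0 * w)      # integral of cos^2 over the interval
    phi_t_sq = math.cos(0.5 * w * T) ** 2 / norm      # phi_1(t)^2 for a unit-norm eigenfunction
    # E[x(t+tau) W_1] = exp(-tau) * phi_1(t) / Lambda_1, so the variance explained
    # by the first component is exp(-2*tau) * phi_1(t)^2 / Lambda_1.
    sigma1_sq = 1.0 - math.exp(-2.0 * tau) * phi_t_sq * inv_lambda
    sigma_sq = 1.0 - math.exp(-2.0 * tau)             # error of the best extrapolator
    return math.sqrt(sigma1_sq / sigma_sq)

for a_tau in (0.1, 0.2):
    print(a_tau, [round(first_pc_ratio(a_tau, a_T), 2) for a_T in (0.25, 1.0, 3.0)])
```

The ratio grows with $\alpha T$ for fixed $\alpha\tau$, which is the loss of accuracy with increasing interval length described in the text.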

This phenomenon may be explained by the fact that the values $x(s)$ at the
beginning and at the end of the interval $t - T < s < t$ contribute equally to
the principal components, whereas the last known values of the process are much
more important for the extrapolation than the earliest ones. It is also clear
that the Ornstein-Uhlenbeck process is the least suitable for extrapolation by
means of principal components because all the information about the future of
such a process is contained in its last known value $x(t)$. However, in all other
cases the best extrapolator will also depend mainly on the values $x(s)$ in
the neighborhood of the point $s = t$. Therefore, for the extrapolator determined
by a fixed number of the first principal components $W_k$, the accuracy of
extrapolation must decrease when the length of the interval of known values
$x(s)$, and consequently the known information, is increasing. This shows that
the application of the method of principal components to extrapolation problems
with long intervals $T$ is not advisable.

4.  Theory  of  canonical  correlations  for  stationary  random  processes 

The decomposition into principal components is not convenient for
extrapolation because it is based on the selection of the functionals
containing maximum total information (that is, maximum variability), whereas
only the specific information about the future values of the process is of
interest for extrapolation. The method of statistical analysis most suitable
for the study of interdependencies and interrelations between two families of
random variables is the method of canonical correlations. Therefore, it is
interesting to investigate the application of this method to statistical
extrapolation. The theory of canonical correlations was developed in the middle
of the 1930's independently by Hotelling [34] and by Oboukhov [35], [36] (see
also Anderson [24], chapter 12).

According to the theory, the investigation of the interrelations of the families
$x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_m)$ begins by finding the normalized
linear combinations $U_1 = \sum_{i=1}^{n} \alpha_{1i} x_i$ and $V_1 = \sum_{j=1}^{m} \beta_{1j} y_j$ having maximum
correlation coefficient $\rho_1$. Then the second linear combinations
$U_2 = \sum_{i=1}^{n} \alpha_{2i} x_i$ and $V_2 = \sum_{j=1}^{m} \beta_{2j} y_j$ are sought. They are uncorrelated with
the first ones and have maximum correlation coefficient $\rho_2$ among all those
which are uncorrelated with the first ones, and so on. As a result one manages
to select coordinate systems in the spaces of the variables $x$ and $y$ such that
all the components of the compound vector $(U_1, U_2, \ldots, U_n, V_1, V_2, \ldots, V_m)$
(where $U_i$ and $V_j$ are the components of $x$ and $y$ in a new coordinate system)
appear to be pairwise uncorrelated with the exception of the pairs $(U_i, V_i)$,
$i = 1, 2, \ldots, l$, where $l \le \min(n, m)$.

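One can show that the canonical variables $U_k = \sum_i \alpha_{ki} x_i = \alpha_k' x$ and
$V_k = \sum_j \beta_{kj} y_j = \beta_k' y$ and the canonical correlations $\rho_k = \lambda$ are determined by
the following algebraic eigenvalue problem:

(42)  $-\lambda \mathcal{B}_{xx} \alpha + \mathcal{B}_{xy} \beta = 0, \quad \mathcal{B}_{yx} \alpha - \lambda \mathcal{B}_{yy} \beta = 0,$

where $\mathcal{B}_{xx}$, $\mathcal{B}_{xy}$, $\mathcal{B}_{yx}$, and $\mathcal{B}_{yy}$ are the corresponding covariance matrices.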
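
A minimal numerical sketch of the eigenvalue problem (42) follows. It assumes real, positive definite covariance matrices and solves the problem by whitening with Cholesky factors and a singular value decomposition, which is one standard route and not necessarily the one used in the cited works; the function name and the random test matrix are hypothetical.

```python
import numpy as np

def canonical_correlations(Bxx, Bxy, Byy):
    """Solve (42): -lam*Bxx a + Bxy b = 0, Byx a - lam*Byy b = 0 (real case)."""
    Lx = np.linalg.cholesky(Bxx)                   # Bxx = Lx Lx'
    Ly = np.linalg.cholesky(Byy)                   # Byy = Ly Ly'
    M = np.linalg.solve(Lx, Bxy)                   # Lx^{-1} Bxy
    M = np.linalg.solve(Ly, M.T).T                 # Lx^{-1} Bxy Ly^{-T}
    P, rho, Qt = np.linalg.svd(M, full_matrices=False)
    alpha = np.linalg.solve(Lx.T, P)               # columns alpha_k, with alpha_k' Bxx alpha_k = 1
    beta = np.linalg.solve(Ly.T, Qt.T)             # columns beta_k, with beta_k' Byy beta_k = 1
    return rho, alpha, beta

# Illustrative joint covariance of (x_1, x_2, x_3, y_1, y_2); values are arbitrary.
rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n + m, n + m))
Sigma = A @ A.T + np.eye(n + m)
Bxx, Bxy, Byy = Sigma[:n, :n], Sigma[:n, n:], Sigma[n:, n:]

rho, alpha, beta = canonical_correlations(Bxx, Bxy, Byy)
# Residuals of the two equations in (42) for every canonical pair.
res1 = -Bxx @ alpha * rho + Bxy @ beta
res2 = Bxy.T @ alpha - Byy @ beta * rho
print(rho, np.abs(res1).max(), np.abs(res2).max())
```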

The method of obtaining the values $U_1, \ldots, U_l$, $V_1, \ldots, V_l$, and $\rho_1, \ldots, \rho_l$
can also be described purely geometrically. Let us consider the
multidimensional space $H_{x,y}$ of all linear combinations
$w = \sum_{i=1}^{n} \alpha_i x_i + \sum_{j=1}^{m} \beta_j y_j$ with the usual scalar product $(w_1, w_2) = E w_1 \bar{w}_2$.
Let $\mathcal{P}_x$ be the matrix of projection in $H_{x,y}$ on the linear subspace $H_x$
consisting of linear combinations of the form $\sum_{i=1}^{n} \alpha_i x_i$, and let $\mathcal{P}_y$ be the
matrix of projection on the subspace $H_y$ of combinations $\sum_{j=1}^{m} \beta_j y_j$. In this
case the squared correlations $\rho_1^2, \ldots, \rho_l^2$ will coincide with the nonzero
eigenvalues of the matrix $\mathcal{B}_x = \mathcal{P}_x \mathcal{P}_y \mathcal{P}_x$ (or the matrix $\mathcal{B}_y = \mathcal{P}_y \mathcal{P}_x \mathcal{P}_y$). The
variables $U_1, \ldots, U_l$ and $V_1, \ldots, V_l$ will be eigenvectors of the matrices $\mathcal{B}_x$
and $\mathcal{B}_y$ corresponding to these eigenvalues.
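
The geometric description can be checked in a small finite-dimensional model. In the sketch below (an illustration under stated assumptions, with an arbitrary covariance matrix), each random variable is represented by a column vector whose ordinary dot products reproduce the covariances, so that orthogonal projection in this vector space plays the role of projection in $H_{x,y}$. In this check the nonzero eigenvalues of $\mathcal{P}_x \mathcal{P}_y \mathcal{P}_x$ come out as the squares $\rho_k^2$ of the canonical correlations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A = rng.standard_normal((n + m, n + m))
Sigma = A @ A.T + np.eye(n + m)            # arbitrary joint covariance of (x, y)

# Columns g_i of G satisfy g_i . g_j = Sigma[i, j]: a concrete model of H_{x,y}.
G = np.linalg.cholesky(Sigma).T
X, Y = G[:, :n], G[:, n:]

def projector(V):
    """Orthogonal projector onto the column span of V."""
    Q, _ = np.linalg.qr(V)                 # orthonormal basis of span(V)
    return Q @ Q.T

Px, Py = projector(X), projector(Y)
eig_geo = np.sort(np.linalg.eigvalsh(Px @ Py @ Px))[::-1][:min(n, m)]

# Algebraic route: eliminating beta from (42) gives Bxx^{-1} Bxy Byy^{-1} Byx alpha = lam^2 alpha.
Bxx, Bxy, Byy = Sigma[:n, :n], Sigma[:n, n:], Sigma[n:, n:]
T = np.linalg.solve(Bxx, Bxy) @ np.linalg.solve(Byy, Bxy.T)
eig_alg = np.sort(np.linalg.eigvals(T).real)[::-1][:min(n, m)]

print(np.sqrt(eig_geo), np.sqrt(eig_alg))  # canonical correlations from both routes
```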

It is clear that in the case where the variables $(x_1, \ldots, x_n, y_1, \ldots, y_m)$ have
a multivariate Gaussian distribution, all the information about the vector $x$
contained in the vector $y$ is fully characterized by the values of the canonical
correlations $\rho_1, \ldots, \rho_l$. Using Shannon's well-known formula it is easy to
calculate that in the case considered the amount of information about $y$
contained in $x$ is equal to $-\frac{1}{2} \sum_{i=1}^{l} \log(1 - \rho_i^2)$ (cf. [33]). In the course of
evaluating the amount of information about a Gaussian random process contained
in another random process, the theory of canonical correlations was generalized
by Gelfand and Yaglom [33] to the case of two infinite families of random
variables (that is, to the case of two random processes $\{x(s), s \in S\}$ and
$\{y(t), t \in T\}$). If $S$ and $T$ are two intervals of the real axis (which can
coincide with each other), the determination of canonical correlations and
canonical variables for $\{x(s)\}$ and $\{y(t)\}$ can formally be reduced to the
solution of the eigenvalue problem (related to (42))

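(43)  $-\lambda \int_T B_{yy}(t, t')\,\varphi(t')\, dt' + \int_S B_{yx}(t, s')\,\psi(s')\, ds' = 0, \quad t \in T,$

      $\int_T B_{xy}(s, t')\,\varphi(t')\, dt' - \lambda \int_S B_{xx}(s, s')\,\psi(s')\, ds' = 0, \quad s \in S,$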

where $B_{xx}$, $B_{xy}$, $B_{yx}$, and $B_{yy}$ are the covariance functions and the
cross-covariance functions. However, the eigenfunctions of the problem, as a
rule, are generalized functions (for example, they can contain the δ-function
and its derivatives; cf. the similar situation in the paper [15] devoted to
extrapolation and filtering). Therefore, the mathematically rigorous
presentation of the theory of canonical correlations for random processes can
be developed more easily by basing it on the geometrical interpretation of the
theory. This interpretation can be extended to the infinite-dimensional case
without any changes, with the exception of the fact that the matrices $\mathcal{P}_x$, $\mathcal{P}_y$
and $\mathcal{B}_x$, $\mathcal{B}_y$ turn out to be operators in Hilbert space (see [33], [37] and the
related purely geometrical papers [38], [39]).

The papers [33], [40] deal with the case where $S$ and $T$ are the same interval
of the time axis $-\infty < s < \infty$ and where $y(s) = x(s) + z(s)$, with $x(s)$ and $z(s)$
mutually uncorrelated stationary random processes with rational spectral
densities. Under the additional assumption that the spectral density of $z(s)$
falls off at infinity faster than the spectral density of $x(s)$, the evaluation
of the canonical correlations for this case can be reduced first to some
eigenvalue problem for a linear differential operator with constant
coefficients and then to the solution of some transcendental equation
containing exponential and trigonometric functions. The number of nonzero
canonical correlations $\rho_k$ in this case is infinite.

For the theory of extrapolation of stationary random processes, another case
is clearly more interesting. This is when $x(s)$ and $y(t)$ are the same stationary
random process, but the sets $S$ and $T$ are different: $S$ is the past (that is,
either the semiaxis $s < t$ or the finite interval $t - T < s < t$), and $T$ is the
future (that is, either the semiaxis $s > t + \tau$ or the interval
$t + \tau < s < t + \tau + T_1$, where $\tau > 0$). Such a theory of canonical correlations of
two parts of the same stationary random process was considered in the paper
[41] for the case where $S$ is the semiaxis $s < t$ and $T$ is the semiaxis
$s > t + \tau$. Here the operator $\mathcal{P}_x$ of the projection of the future on the past is
the operator which transforms $x(t + \tau_1)$, $\tau_1 > \tau$, into its best linear
extrapolator $\hat{x}(t; \tau_1)$. The formulas (13) and (14) show that if the process $x(s)$
has rational spectral density of the form (11), the projection $\mathcal{P}_x H_y$ of the
whole future onto the whole past is a finite-dimensional (namely
$N$-dimensional) linear manifold. Consequently, the number of nonzero eigenvalues
of the operators $\mathcal{B}_x$ and $\mathcal{B}_y$ in this case cannot be more than $N$. In the Gaussian
case it follows from this fact that for a stationary process $x(s)$ with rational
spectral density, all the information about the future contained in its past is
concentrated in $N$ special linear functionals $U_1, \ldots, U_N$ of the values $x(s)$,
$s < t$.
The explicit evaluation of the canonical correlations $\rho_k$ and canonical
variables $U_k$, $V_k$ for the rational spectral densities can be obtained with the
help of a simple modification of the conditions (a), (b), and (c) mentioned in
section 1 in connection with the problem of the best linear extrapolation. Let
us suppose that the spectral function $F(\lambda)$ is absolutely continuous, and let us
introduce the functions $\Phi_k^-(\lambda)$ and $\Phi_k^+(\lambda)$ determined by the relations

(44)  $U_k = \int_{-\infty}^{\infty} e^{it\lambda}\,\Phi_k^-(\lambda)\, dZ(\lambda), \quad V_k = \int_{-\infty}^{\infty} e^{i(t+\tau)\lambda}\,\Phi_k^+(\lambda)\, dZ(\lambda).$

Then the considerations used in [8] for obtaining the sufficient conditions (a),
(b), and (c) allow us to prove the following statement.

Assume that there exist functions $\psi^-(\lambda)$ and $\psi^+(\lambda)$ and a nonnegative number $\rho$
such that:

(a') $\psi^-$ and $\psi^+$ satisfy the conditions

$\int_{-\infty}^{\infty} |\psi^-(\lambda)|^2 F'(\lambda)\, d\lambda = \int_{-\infty}^{\infty} |\psi^+(\lambda)|^2 F'(\lambda)\, d\lambda = 1;$

(b') the function $\psi^+(\lambda)$ may be continued analytically in the upper half-plane
of the complex variable $\lambda$ and $\psi^-(\lambda)$ may be continued analytically in the lower
half-plane so that both functions will not have an order of growth higher than
a power of $|\lambda|$; and

(c') the function $[e^{i\tau\lambda}\psi^+(\lambda) - \rho\psi^-(\lambda)]F'(\lambda)$ may be continued analytically in the
upper half-plane of $\lambda$, and the function $[e^{-i\tau\lambda}\psi^-(\lambda) - \rho\psi^+(\lambda)]F'(\lambda)$ may be
continued analytically in the lower half-plane, so that both these functions
will fall off not slower than a power of $|\lambda|$ at infinity.

Then $\psi^-$ and $\psi^+$ will be the functions $\Phi_k^-(\lambda)$ and $\Phi_k^+(\lambda)$ corresponding to the
canonical variables $U_k$ and $V_k$ and to the canonical correlation $\rho_k = \rho$ (see [41]).

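If the spectral density $F'(\lambda)$ is rational and has the form (11), the stated
conditions may be satisfied by functions $\psi^-(\lambda)$ and $\psi^+(\lambda)$ of the form

(45)  $\psi^-(\lambda) = \frac{\gamma^-(\lambda)}{\prod_{j=1}^{M} (\lambda - \beta_j)}, \quad \psi^+(\lambda) = \frac{\gamma^+(\lambda)}{\prod_{j=1}^{M} (\lambda - \bar{\beta}_j)},$
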

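where $\gamma^-(\lambda)$ and $\gamma^+(\lambda)$ are polynomials of degree $N - 1$. Then the conditions
(a') and (b') will evidently be fulfilled (after the normalization of the
coefficients of $\gamma^-(\lambda)$ and $\gamma^+(\lambda)$). In order to satisfy the condition (c') also,
it is necessary to select $\gamma^-(\lambda)$ and $\gamma^+(\lambda)$ in such a way that the functions

(46)  $\frac{e^{i\tau\lambda} \prod_{j=1}^{M} (\lambda - \beta_j)\,\gamma^+(\lambda) - \rho \prod_{j=1}^{M} (\lambda - \bar{\beta}_j)\,\gamma^-(\lambda)}{\prod_{k=1}^{N} (\lambda - \alpha_k)}, \quad \frac{e^{-i\tau\lambda} \prod_{j=1}^{M} (\lambda - \bar{\beta}_j)\,\gamma^-(\lambda) - \rho \prod_{j=1}^{M} (\lambda - \beta_j)\,\gamma^+(\lambda)}{\prod_{k=1}^{N} (\lambda - \bar{\alpha}_k)}$

should be entire functions of $\lambda$.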
The last condition leads to a system of $2N$ homogeneous linear equations for
the $2N$ unknown coefficients of the polynomials $\gamma^-(\lambda)$ and $\gamma^+(\lambda)$ which contain $\rho$
as a factor in certain terms. The condition of the existence of a nonzero
solution of the system gives us the algebraic determinantal equation for $\rho$ of
degree $2N$. The equation has the roots $\rho_1, \ldots, \rho_N$ and $-\rho_1, \ldots, -\rho_N$. When the
canonical correlations $\rho_k$ have been determined, the coefficients of the
polynomials $\gamma_k^-(\lambda)$ and $\gamma_k^+(\lambda)$ corresponding to the functions $\Phi_k^-(\lambda)$ and $\Phi_k^+(\lambda)$ can
be found from the linear system involved in (c') and the normalizing
conditions (a').

Similarly, one may treat the more general problem about the canonical
correlations and canonical variables for two finite parts $\{x(s), t - T < s < t\}$
and $\{x(s'), t + \tau < s' < t + \tau + T_1\}$ of the stationary random process $x(s)$ with
the rational spectral density (11). Here the operator $\mathcal{P}_x$ transforms the
variables $x(s')$, $t + \tau < s' < t + \tau + T_1$, into the best linear extrapolators in
terms of the values $x(s)$ for $t - T < s < t$. Since the space of all linear
functionals of $x(s)$, where $t - T < s < t$, for $T < \infty$ is a subspace of the space of
the linear functionals of the whole past of the process $x(s)$, it is evident
that the space $\mathcal{P}_x H_y$ cannot be more than $N$-dimensional for $T < \infty$ (cf. equation
(29) and the statement after it). It follows that for two arbitrary disjoint
finite intervals of the process $x(s)$ there cannot exist more than $N$ nonzero
canonical correlations. These correlations and the corresponding canonical
variables can be found with the help of the following modification of the
conditions (a'), (b'), and (c') mentioned in section 1.

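Assume that there exist functions $\psi^-(\lambda)$ and $\psi^+(\lambda)$ and a nonnegative number $\rho$
such that

(a'')  $\int_{-\infty}^{\infty} |\psi^-(\lambda)|^2 F'(\lambda)\, d\lambda = \int_{-\infty}^{\infty} |\psi^+(\lambda)|^2 F'(\lambda)\, d\lambda = 1;$

(b'') the functions $\psi^-(\lambda)$ and $\psi^+(\lambda)$ are entire functions of $\lambda$ represented in the
form $\psi^-(\lambda) = \psi_1^-(\lambda) + e^{-iT\lambda}\psi_2^-(\lambda)$ and $\psi^+(\lambda) = \psi_1^+(\lambda) + e^{iT_1\lambda}\psi_2^+(\lambda)$, where the
functions $\psi_1^-$, $\psi_2^-$, $\psi_1^+$, and $\psi_2^+$ are rational; and

(c'') the functions $[e^{i\tau\lambda}\psi_1^+(\lambda) + e^{i(\tau+T_1)\lambda}\psi_2^+(\lambda) - \rho\psi_1^-(\lambda)]F'(\lambda)$ and $\psi_2^+(\lambda)F'(\lambda)$
may be continued analytically in the upper half-plane of $\lambda$, and the functions
$[e^{-i\tau\lambda}\psi_1^-(\lambda) + e^{-i(\tau+T)\lambda}\psi_2^-(\lambda) - \rho\psi_1^+(\lambda)]F'(\lambda)$ and $\psi_2^-(\lambda)F'(\lambda)$ may be continued
analytically in the lower half-plane so that all the functions will fall off
not slower than a power of $|\lambda|$ at infinity in the corresponding half-planes.

Then the functions $\psi^-(\lambda)$ and $\psi^+(\lambda)$ will be the functions $\Phi_k^-(\lambda)$ and $\Phi_k^+(\lambda)$ of
the equations (44), which determine the canonical variables $U_k$ and $V_k$
(corresponding to the canonical correlation $\rho = \rho_k$) of the parts
$\{x(s), t - T < s < t\}$ and $\{x(s'), t + \tau < s' < t + \tau + T_1\}$ of the process $x(s)$
with the spectral density $F'(\lambda)$.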

The proof of this statement is similar to the proof of conditions $(a_T)$, $(b_T)$,
and $(c_T)$ mentioned at the end of section 1. For the rational spectral density
(11) the conditions may be satisfied by the functions of the form

(47)  $\psi_r^+(\lambda) = \frac{\gamma_r^+(\lambda)}{\left|\prod_{j=1}^{M} (\lambda - \beta_j)\right|^2}, \quad \psi_r^-(\lambda) = \frac{\gamma_r^-(\lambda)}{\left|\prod_{j=1}^{M} (\lambda - \beta_j)\right|^2}, \quad r = 1, 2,$

where $\gamma_r^+(\lambda)$ and $\gamma_r^-(\lambda)$ are polynomials of degree $N + M - 1$. The conditions
(a''), (b''), and (c'') lead to a system of linear homogeneous equations for
the coefficients of the polynomials $\gamma_r^+(\lambda)$ and $\gamma_r^-(\lambda)$. After eliminating some
unknowns from the system, it is possible again to obtain the determinantal
equation of degree $2N$ having the roots $\rho_1, \ldots, \rho_N, -\rho_1, \ldots, -\rho_N$. When the
canonical correlations $\rho_k$ are known, the coefficients of $\gamma_r^+(\lambda)$ and $\gamma_r^-(\lambda)$ can
be easily obtained for every $\rho_k$ from the system of linear equations and the
normalizing conditions (a'').

The best linear extrapolator $\hat{x}(t; \tau)$ in terms of the values $x(s)$ for $s < t$ or
$t - T < s < t$ can always be decomposed into the sum of contributions of the
different canonical variables $U_k$ for the corresponding past values and an
arbitrary part of the future which contains the point $t + \tau$ (for example, for
the semiaxis $s > t + \tau$ or $s > t + \tau_0$, where $0 < \tau_0 < \tau$). Therefore,

(48)  $\hat{x}(t; \tau) = \sum_k E[x(t + \tau)\, U_k] \cdot U_k.$

Usually in real situations the canonical correlations $\rho_k$ decrease rapidly as
the index $k$ increases. Therefore, as a rule, the extrapolator can be
approximated accurately by the first few terms on the right-hand side of (48).
In the special case of the Ornstein-Uhlenbeck process, where the method of
principal components turns out to be ineffective for the purpose of
extrapolation, the right-hand side of (48) contains only one term,
corresponding to $U_1 = x(t)$.
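
The Ornstein-Uhlenbeck case can be illustrated numerically. The sketch below is a discretized stand-in for the continuous problem: it samples the past and the future on finite grids (the grid sizes, the spacing `dt`, and the function names are assumptions of the example), takes the covariance $B(\tau) = C e^{-\alpha|\tau|}$, and computes the canonical correlations between the two sets of samples by whitening and a singular value decomposition.

```python
import numpy as np

def ou_cov(u, alpha=1.0, C=1.0):
    """Assumed Ornstein-Uhlenbeck covariance B(u) = C * exp(-alpha * |u|)."""
    return C * np.exp(-alpha * np.abs(u))

def past_future_correlations(tau, p=20, q=20, dt=0.1, alpha=1.0):
    """Canonical correlations between sampled past and sampled future (t = 0)."""
    past = -dt * np.arange(p)                    # s = t, t - dt, t - 2*dt, ...
    future = tau + dt * np.arange(q)             # s' = t + tau, t + tau + dt, ...
    Bxx = ou_cov(past[:, None] - past[None, :], alpha)
    Byy = ou_cov(future[:, None] - future[None, :], alpha)
    Bxy = ou_cov(past[:, None] - future[None, :], alpha)
    Lx = np.linalg.cholesky(Bxx)
    Ly = np.linalg.cholesky(Byy)
    # Whitened cross-covariance Lx^{-1} Bxy Ly^{-T}; its singular values are
    # the canonical correlations.
    M = np.linalg.solve(Ly, np.linalg.solve(Lx, Bxy).T).T
    return np.linalg.svd(M, compute_uv=False)

rho = past_future_correlations(tau=0.5)
print(rho[:3], np.exp(-0.5))
```

In this discretized setting only one canonical correlation is numerically nonzero, and it equals the correlation between $x(t)$ and $x(t + \tau)$, in line with the single-term form of (48) for this process.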

In the more general case of the arbitrary rational spectral density (11), the
right-hand side of (48) contains a finite number (namely $N$) of terms; however,
all of them, with the exception of the first one or two terms, are usually
negligible.
If  we  increase  the  length  of  the  interval  of  the  known  past  values  of  the  process, 
the  accuracy  of  the  approximate  extrapolator  containing  only  the  fixed  number 
of  right-hand  terms  in  (48)  will  be  increasing  too.  All  these  facts  display  the 
great  advantages  of  the  canonical  variables  in  comparison  with  the  principal 
components  in  studying  the  statistical  extrapolation. 